A Novel Signal Processing Measure to Identify Exact and Inexact Tandem Repeat Patterns in DNA Sequences
Abstract
The identification and analysis of repetitive patterns are active areas of biological and computational research. Tandem repeats in telomeres play a role in cancer, and hypervariable trinucleotide tandem repeats are linked to over a dozen major neurodegenerative genetic disorders. In this paper, we present an algorithm to identify exact and inexact repeat patterns in DNA sequences based on the orthogonal exactly periodic subspace decomposition technique. Using the new measure, our algorithm resolves problems such as whether a repeat pattern has period P or one of its multiples (2P, 3P, etc.), along with several other problems present in previous signal-processing-based algorithms. The algorithm for identifying repeats is efficient, running in O(N L(w) log L(w)) time, where N is the length of the DNA sequence and L(w) is the window length. It operates in two stages: in the first stage, each nucleotide is analyzed separately for periodicity, and in the second stage, the periodic information of all nucleotides is combined to identify the tandem repeats. Experiments were performed on datasets containing exact and inexact repeats, and the results show the effectiveness of the approach.
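To make the two-stage idea concrete, here is a minimal Python sketch. It is not the authors' algorithm or their measure: the binary indicator sequences per nucleotide, the normalisation, the window truncation and every function name (`exact_periodic_energy`, `period_scores`, `best_period`, `scan`) are illustrative assumptions. It relies on one property of exactly periodic subspaces: when the window length is a multiple of P, the space of P-periodic sequences splits into orthogonal "exactly d-periodic" pieces, one for each divisor d of P, so the energy in the exactly P-periodic piece can be recovered from simple residue-class projections by Möbius inversion. A perfect period-3 repeat then has zero energy at period 6, which illustrates why such a decomposition can separate a period P from its multiples.

```python
# Hypothetical sketch (not the paper's measure): stage 1 scores each nucleotide's
# indicator sequence separately; stage 2 combines the four bases per period.
from collections.abc import Iterable


def mobius(n: int) -> int:
    """Mobius function mu(n), computed by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # n had a squared prime factor
            result = -result
        p += 1
    return -result if n > 1 else result


def periodic_energy(x: list[float], d: int) -> float:
    """Energy of the orthogonal projection of x onto d-periodic sequences.

    Assumes len(x) is a multiple of d: the projection replaces every sample
    by the mean of the samples in the same residue class mod d.
    """
    reps = len(x) // d
    return sum(sum(x[r::d]) ** 2 / reps for r in range(d))


def exact_periodic_energy(x: list[float], p: int) -> float:
    """Energy in the exactly p-periodic subspace, by Mobius inversion of
    E_per(p) = sum of E_exact(d) over all divisors d of p."""
    e = sum(mobius(p // d) * periodic_energy(x, d)
            for d in range(1, p + 1) if p % d == 0)
    return max(0.0, e)  # clamp tiny negative rounding errors


def indicator(seq: str, base: str) -> list[float]:
    """Binary indicator sequence: 1.0 where seq has `base`, else 0.0."""
    return [1.0 if c == base else 0.0 for c in seq]


def period_scores(window: str, periods: Iterable[int]) -> dict[int, float]:
    """Fraction of the non-DC indicator energy that lies in the exactly
    p-periodic subspace, pooled over A, C, G, T (hypothetical measure)."""
    scores = {}
    for p in periods:
        usable = (len(window) // p) * p  # truncate to a multiple of p
        if usable < 2 * p:               # need at least two copies of the unit
            scores[p] = 0.0
            continue
        num = den = 0.0
        for base in "ACGT":
            x = indicator(window[:usable], base)  # stage 1: one base at a time
            num += exact_periodic_energy(x, p)
            den += sum(v * v for v in x) - periodic_energy(x, 1)
        scores[p] = num / den if den > 0 else 0.0  # stage 2: combine bases
    return scores


def best_period(window: str, max_period: int) -> tuple[int, dict[int, float]]:
    """Period in 2..max_period with the highest score (ties -> smallest)."""
    scores = period_scores(window, range(2, max_period + 1))
    return max(scores, key=scores.get), scores


def scan(seq: str, window_len: int, step: int, max_period: int,
         threshold: float) -> list[tuple[int, int, float]]:
    """Slide a window of length window_len (playing the role of L(w)) over seq
    and report (start, period, score) wherever the best score passes threshold."""
    hits = []
    for start in range(0, len(seq) - window_len + 1, step):
        p, scores = best_period(seq[start:start + window_len], max_period)
        if scores[p] >= threshold:
            hits.append((start, p, scores[p]))
    return hits


if __name__ == "__main__":
    print(best_period("CAG" * 8, 8)[0])                   # exact CAG repeat
    print(best_period("CAGCAGCAGCTGCAGCAGCAGCAG", 8)[0])  # one substitution
    print(scan("CAG" * 16, window_len=24, step=12, max_period=8, threshold=0.5))
```

For readability the projections are computed directly in the time domain (roughly O(L·P) work per tested period) rather than with an FFT-based scheme of the kind implied by the O(N L(w) log L(w)) bound. In this sketch the window is truncated to a multiple of each tested period so that the subspaces stay orthogonal. With the exact repeat, all non-DC energy falls in the exactly 3-periodic subspace and none in the exactly 6-periodic one. A single substitution lowers the period-3 score but leaves it well above the others.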
Similar resources
New Aspects in Numerical Representations Involved in DNA Repeats Detection
The presence of repeated sequences is a fundamental feature of biological genomes. The detection of tandem repeats is important in biology and medicine as it can be used for phylogenic studies and disease diagnosis. A major difficulty in identification of repeats arises from the fact that the repeat units can be either exact or imperfect, in tandem or dispersed, and of unspecified length. Many ...
Identifying Obscure Periodic Patterns in Genomic DNA Sequence
Genomic DNA sequences are rich in periodic patterns, which play important biological roles in processes such as gene expression, genome structural stabilization, and recombination. Tandem repeats are a type of periodic pattern and are associated with several genetic diseases. Tandem Repeats Finder (TRF) [1] is one of the most widely used programs for finding tandem repeats without a priori knowledge. Statistical models a...
Signal processing approaches as novel tools for the clustering of N-acetyl-β-D-glucosaminidases
Nowadays, the clustering of proteins, and of enzymes in particular, is one of the most popular topics in bioinformatics. An increasing number of chitinase genes from different organisms, together with their sequences, have been identified. So far, various mathematical algorithms have been used to cluster chitinase genes, but most of them seem confusing and sometimes insufficient. In the...
Data Mining for Identification of Forkhead Box O (FOXO3a) in Different Organisms Using Nucleotide and Tandem Repeat Sequences
Background: Deregulation of the FOXO3a gene, which belongs to the Forkhead box O (FOXO) family of transcription factors, can cause cancer (e.g., breast cancer). FOXO factors play important roles in ubiquitination, acetylation, deacetylation, protein-protein interactions, and phosphorylation. Understanding the regulation and mechanisms of FOXO3a can lead to cancer treatment. The aim of this study recent association...
Short Tandem Repeats Detection in DNA Sequences Using...
Identifying short tandem repeats in DNA sequences is a challenging problem for scientists and engineers today. Detecting short tandem repeats is also an important part of gene annotation, and it is useful for identifying various hereditary diseases and for human identification. Several methods have been developed to find short tandem repeats, and the...
Volume: 2007
Publication date: 2007